String Indexing with Compressed Patterns
Given a string S of length n, the classic string indexing problem is to preprocess S into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
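To make the input format concrete, here is a minimal Python sketch of a greedy LZ77 parse and its inverse. It assumes one common LZ77 variant in which each phrase is either a single literal character or a copy `(source_start, length)` of an earlier, possibly overlapping, substring. The function names are hypothetical, and the quadratic-time parser only illustrates the kind of compressed pattern a client would send. It is not the paper's index or query algorithm.

```python
from typing import List, Tuple, Union

# Assumed phrase format: a literal character, or (source_start, length) copying
# `length` characters starting at an earlier position `source_start`.
Phrase = Union[str, Tuple[int, int]]


def lz77_parse(s: str) -> List[Phrase]:
    """Greedy LZ77 parse in quadratic time (illustrative, not efficient)."""
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):
            length = 0
            # The source may run into the current phrase (self-reference).
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(s[i])  # no earlier match: emit a literal
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decompress(phrases: List[Phrase]) -> str:
    out: List[str] = []
    for phrase in phrases:
        if isinstance(phrase, str):
            out.append(phrase)
        else:
            src, length = phrase
            # Copy one character at a time so overlapping sources work.
            for t in range(length):
                out.append(out[src + t])
    return "".join(out)


print(lz77_parse("abababab"))  # ['a', 'b', (0, 6)]
```

A string of length 8 collapses to three phrases here. A query algorithm in the spirit of the abstract would work on such phrase lists directly instead of calling `lz77_decompress` first.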
Differentially Private Approximate Pattern Matching
In this paper, we consider the k-approximate pattern matching problem under differential privacy, where the goal is to report or count all substrings of a given string S which have a Hamming distance at most k to a pattern P, or decide whether such a substring exists. In our definition of privacy, individual positions of the string S are protected. To be able to answer queries under differential privacy, we allow some slack on k, i.e. we allow reporting or counting substrings of S with a distance at most (1+γ)k+α to P, for a multiplicative error γ and an additive error α. We analyze which values of α and γ are necessary or sufficient to solve the k-approximate pattern matching problem while satisfying ε-differential privacy. Let n denote the length of S. We give 1) an ε-differentially private algorithm with an additive error of … and no multiplicative error for the existence variant; 2) an ε-differentially private algorithm with an additive error of … for the counting variant; 3) an ε-differentially private algorithm with an additive error of … and multiplicative error … for the reporting variant for a special class of patterns. The error bounds hold with high probability. All of these algorithms return a witness, that is, if there exists a substring of S with distance at most k to P, then the algorithm returns a substring of S with distance at most (1+γ)k+α to P. Further, we complement these results by a lower bound, showing that any algorithm for the existence variant which also returns a witness must have an additive error of … with constant probability.
Comment: This is a full version of a paper accepted to ITCS 202
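As a point of reference, the following Python sketch solves the three variants exactly and with no privacy protection, for a string `s`, a pattern `p` of length m and a threshold `k`. It compares `p` against every length-m window of `s`. The function names are hypothetical. A differentially private algorithm as described above would additionally need randomization and slack on the distance threshold, and neither is shown here.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions of two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def approx_occurrences(s: str, p: str, k: int) -> List[int]:
    """Reporting: start positions of substrings of s within distance k of p."""
    m = len(p)
    return [i for i in range(len(s) - m + 1) if hamming(s[i:i + m], p) <= k]


def approx_count(s: str, p: str, k: int) -> int:
    """Counting: number of such substrings."""
    return len(approx_occurrences(s, p, k))


def approx_witness(s: str, p: str, k: int) -> Optional[int]:
    """Existence: the first witness position, or None if there is none."""
    m = len(p)
    for i in range(len(s) - m + 1):
        if hamming(s[i:i + m], p) <= k:
            return i
    return None


print(approx_occurrences("abcabd", "abd", 1))  # [0, 3]
```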
Gapped Indexing for Consecutive Occurrences
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting queries (return all positions where the pattern occurs), and counting queries (return the number of occurrences of the pattern). In this paper we consider a variant of string indexing, where the goal is to compactly represent the string such that given two patterns P₁ and P₂ and a gap range [α, β] we can quickly find the consecutive occurrences of P₁ and P₂ with distance in [α, β], i.e., pairs of subsequent occurrences with distance within the range. We present data structures that use Õ(n) space and query time Õ(|P₁|+|P₂|+n^{2/3}) for existence and counting and Õ(|P₁|+|P₂|+n^{2/3}occ^{1/3}) for reporting. We complement this with a conditional lower bound based on the set intersection problem showing that any solution using Õ(n) space must use Ω̃(|P₁| + |P₂| + √n) query time. To obtain our results we develop new techniques and ideas of independent interest including a new suffix tree decomposition and hardness of a variant of the set intersection problem.
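The following brute-force Python sketch illustrates the query semantics only, not the compact data structures described above. It assumes that (i, j) is a consecutive occurrence of the two patterns (here `p1` and `p2`) when `p1` starts at i, `p2` starts at j > i, and no occurrence of either pattern starts strictly between i and j. The names `occurrences` and `gapped_consecutive` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def occurrences(s: str, p: str) -> List[int]:
    """All (possibly overlapping) start positions of p in s, in increasing order."""
    return [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]


def gapped_consecutive(
    s: str, p1: str, p2: str, alpha: int, beta: int
) -> List[Tuple[int, int]]:
    """Consecutive occurrences (i, j) of p1 and p2 with alpha <= j - i <= beta."""
    occ1 = occurrences(s, p1)
    occ2 = set(occurrences(s, p2))
    # Sorted start positions of either pattern.
    merged = sorted(set(occ1) | occ2)
    result: List[Tuple[int, int]] = []
    for i in occ1:
        # First position after i where p1 or p2 starts.
        idx = bisect_right(merged, i)
        if idx == len(merged):
            continue
        j = merged[idx]
        # (i, j) is consecutive only if that next start is an occurrence of p2.
        if j in occ2 and alpha <= j - i <= beta:
            result.append((i, j))
    return result


print(gapped_consecutive("xaxxbxaxb", "a", "b", 3, 5))  # [(1, 4)]
```

In `"aab"` with patterns `"a"` and `"b"`, only `(1, 2)` qualifies, because the occurrence of `"a"` at position 1 lies between positions 0 and 2.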
String Indexing for Top-k Close Consecutive Occurrences
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0,1]. Our first result achieves O(n log n) space and optimal query time of O(m+k), and our second result achieves linear space and query time O(m+k^{1+ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
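A naive Python baseline makes the problem definition concrete. It lists all occurrences of P, pairs each one with the next, and keeps the k pairs of smallest distance. Because it scans the whole text, it takes O(nm) time instead of the O(m+k) query time above. Breaking ties by the earlier start position is an assumption of this sketch, and `top_k_close` is a hypothetical name.

```python
import heapq
from typing import List, Tuple


def top_k_close(s: str, p: str, k: int) -> List[Tuple[int, int]]:
    """The k consecutive occurrences (i, j) of p in s with smallest j - i."""
    occ = [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]
    # Neighbours in the sorted occurrence list are exactly the consecutive occurrences.
    pairs = list(zip(occ, occ[1:]))
    # Order by distance, breaking ties by the earlier start position (assumed).
    return heapq.nsmallest(k, pairs, key=lambda ij: (ij[1] - ij[0], ij[0]))


print(top_k_close("abaababaab", "ab", 2))  # [(3, 5), (0, 3)]
```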
Compressed Indexing for Consecutive Occurrences
The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern P. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns P₁ and P₂ and a range [a,b], and one must find all consecutive occurrences (q₁,q₂) of P₁ and P₂ such that q₂−q₁ ∈ [a,b]. By their results, we cannot hope for a very efficient indexing structure for such queries, even if a = 0 is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.
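To illustrate what a text given as a straight-line program looks like, here is a small Python sketch. It assumes an encoding in which rule i is either a terminal character or a pair of indices of earlier rules. The helper names are hypothetical, and the code is not the index described above. It only shows how precomputed expansion lengths let one access a text position by descending the grammar instead of decompressing the text.

```python
from typing import List, Tuple, Union

# Assumed encoding: rules[i] is a terminal character or a pair (l, r) of
# indices of earlier rules, meaning X_i -> X_l X_r.
Rule = Union[str, Tuple[int, int]]


def slp_lengths(rules: List[Rule]) -> List[int]:
    """Length of the expansion of every nonterminal, computed bottom-up."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def slp_expand(rules: List[Rule], symbol: int) -> str:
    """Fully decompress one nonterminal (iteratively, to avoid deep recursion)."""
    out: List[str] = []
    stack = [symbol]
    while stack:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            out.append(rule)
        else:
            left, right = rule
            stack.append(right)  # pushed first so the left part is expanded first
            stack.append(left)
    return "".join(out)


def slp_char_at(rules: List[Rule], lengths: List[int], symbol: int, pos: int) -> str:
    """Character at position pos of the expansion, found by descending the grammar."""
    while True:
        rule = rules[symbol]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if pos < lengths[left]:
            symbol = left
        else:
            pos -= lengths[left]
            symbol = right
```

For example, the five rules `['a', 'b', (0, 1), (2, 2), (3, 3)]` describe the string `abababab`. Each additional doubling rule doubles the text length at the cost of a single rule.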
String Indexing for Top-k Close Consecutive Occurrences
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (SITCCO). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε ∈ (0,1]. Our first result achieves O(n log n) space and optimal query time of O(m+k), and our second result achieves linear space and query time O(m+k^{1+ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
Comment: Fixed typos, minor change
Reciprocity between narrative, questioning and imagination in the early and primary years: examining the role of narrative in possibility thinking
The concept of Possibility Thinking (PT) as a driving force of creativity has been investigated both conceptually and empirically for over a decade in early years settings and primary classrooms in England. In the first wave of qualitative empirical studies, play formed part of the enabling context. Criteria for episode selection for PT analysis were that episodes exhibited children immersed in sustained focused playful activity. During the second wave of PT studies, the research team's attention was drawn to children's imaginative storying in such playful contexts and it emerged that consideration of narrative in PT might prove fruitful. The current paper revisits key published work, and drawing on data previously analysed for features of PT, seeks to explore how narrative might relate to the current theorised framework. Fourteen published PT episodes are re-analysed in order to consider the role and construction of narrative in PT. The new analysis reveals that narrative plays a foundational role in PT, and that reciprocal relationships exist between questioning, imagination and narrative, layered between children and adults. Consequences for nurturing children's creativity and for future PT research are explored.
Effect of natalizumab on disease progression in secondary progressive multiple sclerosis (ASCEND): a phase 3, randomised, double-blind, placebo-controlled trial with an open-label extension
Background: Although several disease-modifying treatments are available for relapsing multiple sclerosis, treatment effects have been more modest in progressive multiple sclerosis and have been observed particularly in actively relapsing subgroups or those with lesion activity on imaging. We sought to assess whether natalizumab slows disease progression in secondary progressive multiple sclerosis, independent of relapses. Methods: ASCEND was a phase 3, randomised, double-blind, placebo-controlled trial (part 1) with an optional 2 year open-label extension (part 2). Enrolled patients aged 18–58 years were natalizumab-naive and had secondary progressive multiple sclerosis for 2 years or more, disability progression unrelated to relapses in the previous year, and Expanded Disability Status Scale (EDSS) scores of 3·0–6·5. In part 1, patients from 163 sites in 17 countries were randomly assigned (1:1) to receive 300 mg intravenous natalizumab or placebo every 4 weeks for 2 years. Patients were stratified by site and by EDSS score (3·0–5·5 vs 6·0–6·5). Patients completing part 1 could enrol in part 2, in which all patients received natalizumab every 4 weeks until the end of the study. Throughout both parts, patients and staff were masked to the treatment received in part 1. The primary outcome in part 1 was the proportion of patients with sustained disability progression, assessed by one or more of three measures: the EDSS, Timed 25-Foot Walk (T25FW), and 9-Hole Peg Test (9HPT). The primary outcome in part 2 was the incidence of adverse events and serious adverse events. Efficacy and safety analyses were done in the intention-to-treat population. This trial is registered with ClinicalTrials.gov, number NCT01416181. Findings: Between Sept 13, 2011, and July 16, 2015, 889 patients were randomly assigned (n=440 to the natalizumab group, n=449 to the placebo group). 
In part 1, 195 (44%) of 439 natalizumab-treated patients and 214 (48%) of 448 placebo-treated patients had confirmed disability progression (odds ratio [OR] 0·86; 95% CI 0·66–1·13; p=0·287). No treatment effect was observed on the EDSS (OR 1·06, 95% CI 0·74–1·53; nominal p=0·753) or the T25FW (0·98, 0·74–1·30; nominal p=0·914) components of the primary outcome. However, natalizumab treatment reduced 9HPT progression (OR 0·56, 95% CI 0·40–0·80; nominal p=0·001). In part 1, 100 (22%) placebo-treated and 90 (20%) natalizumab-treated patients had serious adverse events. In part 2, 291 natalizumab-continuing patients and 274 natalizumab-naive patients received natalizumab (median follow-up 160 weeks [range 108–221]). Serious adverse events occurred in 39 (13%) patients continuing natalizumab and in 24 (9%) patients initiating natalizumab. Two deaths occurred in part 1, neither of which was considered related to study treatment. No progressive multifocal leukoencephalopathy occurred. Interpretation: Natalizumab treatment for secondary progressive multiple sclerosis did not reduce progression on the primary multicomponent disability endpoint in part 1, but it did reduce progression on its upper-limb component. Longer-term trials are needed to assess whether treatment of secondary progressive multiple sclerosis might produce benefits on additional disability components. Funding: Biogen
Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD
Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). pQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) was explored using conditional independence tests. We identified 527 highly significant (p … 10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10⁻³⁹²) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R²) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema. 
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTL and eQTL SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.